Add repo-controlled robots.txt allowing AI crawlers #43

Draft
TaprootFreak wants to merge 1 commit into main from feat/robots-allow-ai-crawlers

@TaprootFreak

Summary

Adds a version-controlled robots.txt that serves as the authoritative crawl policy for the JuiceDollar documentation site (docs.juicedollar.com). The site is public documentation, and we explicitly want both search engines and AI agents to crawl, index, and learn from it.

What it does

  • Wildcard group allows all user-agents (Allow: /) and sets a Content-Signal granting search=yes, ai-input=yes (AI input / RAG), and ai-train=yes (AI training). We deliberately do not signal ai-train=no.
  • Named AI crawlers (ClaudeBot, GPTBot, Google-Extended, CCBot, Bytespider, Amazonbot, Applebot-Extended, meta-externalagent) are additionally listed by name, since some honor only a record that names them explicitly and ignore the wildcard group.
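
As a rough illustration of the two groups described above, the sketch below embeds a hypothetical reconstruction of the policy and checks it with Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text, the `allowed_agents` helper, and the `SomeOtherBot` agent are assumptions made for this example; the committed file may differ in ordering and wording. The parser does not interpret the `Content-Signal` line, so only the `Allow` rules are exercised here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the policy described in this PR.
# The committed robots.txt may differ in ordering, comments, and wording.
ROBOTS_TXT = """\
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=yes
Allow: /

User-agent: ClaudeBot
User-agent: GPTBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Amazonbot
User-agent: Applebot-Extended
User-agent: meta-externalagent
Allow: /
"""

NAMED_CRAWLERS = [
    "ClaudeBot", "GPTBot", "Google-Extended", "CCBot",
    "Bytespider", "Amazonbot", "Applebot-Extended", "meta-externalagent",
]


def allowed_agents(robots_txt, agents, url):
    """Map each user-agent to whether the given robots.txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; unknown directives such as
    # Content-Signal are skipped by this parser.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://docs.juicedollar.com/"
    # SomeOtherBot is a made-up agent that falls through to the wildcard group.
    results = allowed_agents(ROBOTS_TXT, NAMED_CRAWLERS + ["SomeOtherBot"], url)
    for agent, ok in results.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

A check like this only confirms that the file parses and that no group accidentally blocks a listed crawler; it says nothing about how any individual crawler actually behaves.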

Placement

The file lives in src/.vuepress/public/, which VuePress copies verbatim to the published site root, so it is served at /robots.txt. Verified with a local npm run build: the file appears byte-identical at src/.vuepress/dist/robots.txt.
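
The byte-identical check can be repeated after any future build. The sketch below is one way to do it, assuming the two paths named above and that `npm run build` has already been run from the repository root; `sha256_of` and `is_byte_identical` are helper names invented for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_byte_identical(source, built):
    """True when both files have exactly the same bytes."""
    return sha256_of(source) == sha256_of(built)


if __name__ == "__main__":
    # Paths as described in this PR; run from the repository root after a build.
    source = Path("src/.vuepress/public/robots.txt")
    built = Path("src/.vuepress/dist/robots.txt")
    print("identical" if is_byte_identical(source, built) else "differs")
```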

Sitemap

No Sitemap: directive is included: the build does not generate a sitemap.xml (no sitemap plugin is configured) and https://docs.juicedollar.com/sitemap.xml currently returns 404. The directive can be added later if a sitemap is introduced.
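
If a sitemap is introduced later, a small check along these lines could confirm whether the directive is present. It is a sketch only: `sitemap_urls` is a hypothetical helper, and it relies on `RobotFileParser.site_maps()`, which is available in Python 3.8 and later and returns `None` when no `Sitemap:` line exists.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt):
    """Return the Sitemap URLs declared in robots_txt, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap directive.
    return parser.site_maps() or []


if __name__ == "__main__":
    current = "User-agent: *\nAllow: /\n"
    # Hypothetical future version, once sitemap.xml is actually published.
    future = current + "Sitemap: https://docs.juicedollar.com/sitemap.xml\n"
    print(sitemap_urls(current))
    print(sitemap_urls(future))
```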

Notes

The site is published via Cloudflare Pages, which serves repository static assets as-is; no platform-specific configuration is required for this file.

Add a version-controlled robots.txt that serves as the authoritative
crawl policy for the JuiceDollar documentation site
(docs.juicedollar.com). The file explicitly welcomes both search
engines and AI agents to crawl, index, and learn from this public
documentation.

- Wildcard group allows all user-agents and sets a Content-Signal
  granting search, AI input / RAG, and AI training.
- Major AI crawlers (ClaudeBot, GPTBot, Google-Extended, CCBot,
  Bytespider, Amazonbot, Applebot-Extended, meta-externalagent) are
  additionally listed by name, since some honor only their own record.

The file lives in src/.vuepress/public/, which VuePress copies verbatim
to the published site root (dist/robots.txt). No Sitemap directive is
included because the site does not currently publish a sitemap.xml.